Block-Regularized m × 2 Cross-Validated Estimator of the Generalization Error

Authors

  • Ruibo Wang
  • Yu Wang
  • Jihong Li
  • Xingli Yang
  • Jing Yang
Abstract

A cross-validation method based on m replications of two-fold cross-validation is called an m × 2 cross-validation. An m × 2 cross-validation is used to estimate the generalization error and to compare the performance of algorithms in machine learning. However, the variance of the m × 2 cross-validated estimator of the generalization error is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in m × 2 cross-validation. This fluctuation results in a large variance in the m × 2 cross-validated estimator, and the influence of the random partitions on the variance becomes more serious as m increases. Thus, in this study, partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross-validation is called block-regularized m × 2 cross-validation (m × 2 BCV); it effectively reduces the influence of random partitions. We prove that the variance of the m × 2 BCV estimator of the generalization error is smaller than the variance of the m × 2 cross-validated estimator and reaches its minimum in a special situation, for which an analytical expression of the variance can also be derived. This conclusion is validated through simulation experiments. Furthermore, a practical construction method for m × 2 BCV based on a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of the estimator of the generalization error.
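The abstract only states that the block-regularized partitions can be built from a two-level orthogonal array. The sketch below illustrates one way such m × 2 splits could be generated, under explicit assumptions: a Sylvester-type Hadamard matrix (with its all-ones column dropped) serves as the two-level orthogonal array, the data is cut into equal-sized blocks, and the names hadamard and block_regularized_m2_splits are hypothetical, not the authors' implementation.

```python
import numpy as np


def hadamard(k):
    """Sylvester construction of a 2**k x 2**k Hadamard matrix with +/-1 entries."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H


def block_regularized_m2_splits(n_samples, m, seed=None):
    """Sketch: m two-fold splits whose pairwise training-set overlap is fixed.

    The sample indices are shuffled once and cut into N = 2**k blocks
    (with N - 1 >= m).  For replication j, the sign in column j of the
    Hadamard matrix (all-ones column dropped) decides whether a block goes
    to fold 0 or fold 1.  Because these columns are balanced and mutually
    orthogonal, any two training (test) sets share roughly n/4 samples
    (exactly n/4 when n_samples is divisible by N).
    """
    rng = np.random.default_rng(seed)
    k = 1
    while 2 ** k - 1 < m:          # smallest array with at least m balanced columns
        k += 1
    N = 2 ** k
    H = hadamard(k)[:, 1 : m + 1]  # N x m matrix of +/-1 block assignments
    blocks = np.array_split(rng.permutation(n_samples), N)
    splits = []
    for j in range(m):
        fold0 = np.concatenate([b for b, s in zip(blocks, H[:, j]) if s == 1])
        fold1 = np.concatenate([b for b, s in zip(blocks, H[:, j]) if s == -1])
        splits.append((fold0, fold1))
    return splits


# Example: 5 x 2 BCV indices on 1000 samples.
for fold0, fold1 in block_regularized_m2_splits(1000, m=5, seed=0):
    pass  # fit on fold0 and evaluate on fold1, then swap the roles
```

Each replication trains on one fold and tests on the other, then swaps the roles; averaging the resulting 2m hold-out errors gives the m × 2 BCV estimate of the generalization error, with the fixed pairwise overlap playing the variance-reducing role described above.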


Similar Articles

A New Ridge Estimator in Linear Measurement Error Model with Stochastic Linear Restrictions

In this paper, we propose a new ridge-type estimator called the new mixed ridge estimator (NMRE) by unifying the sample and prior information in the linear measurement error model with additional stochastic linear restrictions. The new estimator is a generalization of the mixed estimator (ME) and ridge estimator (RE). The performances of this new estimator and mixed ridge estimator (MRE) against th...


Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression

A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit, as we will stabilize the unbiased generalization error estimates by regularization and finally obtain more robust model selection criteria for learning. We trade a small bias aga...


A New Meta-Criterion for Regularized Subspace Information Criterion

In order to obtain better generalization performance in supervised learning, model parameters should be determined appropriately, i.e., they should be determined so that the generalization error is minimized. However, since the generalization error is inaccessible in practice, the model parameters are usually determined so that an estimator of the generalization error is minimized. The regulari...


Model selection for linear classifiers using Bayesian error estimation

Regularized linear models are important classification methods for high-dimensional problems, where regularized linear classifiers are often preferred due to their ability to avoid overfitting. The degree of freedom of the model is determined by a regularization parameter, which is typically selected using counting-based approaches, such as K-fold cross-validation. For large data, this can be v...


An Efficient Method to Estimate Bagging's Generalization Error

Bagging [1] is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set [5, 4]. The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation, one needs to train the underlying algorithm on the order of m times, where...



Journal:
  • Neural Computation

Volume 29, Issue 2

Pages: -

Publication date: 2017